Extending the guide with multiple LoRAs serving with BBR #1859
base: main
Conversation
…serve multiple LoRAs (many LoRAs per one model while having multiple models)
✅ Deploy Preview for gateway-api-inference-extension ready!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: davidbreitgand. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @davidbreitgand. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
```yaml
        value: /
      headers:
      - type: Exact
        #Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header.
```
Can we remove this comment from the yaml?
Theoretically, for testing the HTTPRoute functionality, one can inject the header manually.
This has nothing to do with BBR, which is just an implementation detail and one way to inject the model-name header.
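For illustration, a minimal sketch of what that manual injection could look like, assuming the gateway is reachable at `$IP:$PORT`; the address and the request body are placeholders, not values from this PR:

```bash
# Hypothetical test request: set X-Gateway-Model-Name by hand so that the
# HTTPRoute header match can be exercised without deploying BBR.
# IP, PORT, and the JSON payload are placeholders for your environment.
curl -i http://${IP}:${PORT}/v1/completions \
  -H 'Content-Type: application/json' \
  -H 'X-Gateway-Model-Name: food-review-1' \
  -d '{
        "model": "food-review-1",
        "prompt": "Write a short review of a restaurant.",
        "max_tokens": 100
      }'
```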
```yaml
        value: /
      headers:
      - type: Exact
        #Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header.
```
ditto
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: vllm-llama3-8b-instruct-lora-food-review-1 #give this HTTPRoute any name that helps you to group and track the routes
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
    matches:
    - path:
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        #Body-Based routing(https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/bbr/README.md) is being used to copy the model name from the request body to the header.
        name: X-Gateway-Model-Name
        value: 'food-review-1' #this is the name of LoRA as defined in vLLM deployment
    timeouts:
      request: 300s
```
Why is this not part of the first HTTPRoute llm-llama-route that maps to InferencePool vllm-llama3-8b-instruct?
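To make the question concrete, a rough sketch of what folding this match into the existing route could look like, assuming the first route is named llm-llama-route and already targets the same InferencePool; the rule layout and the base-model header value are assumptions, not part of this PR:

```bash
# Hypothetical merged route: the LoRA header match becomes an additional
# match on the existing llm-llama-route instead of a separate HTTPRoute.
kubectl apply -f - <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-llama-route                 # assumed name of the existing route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
    matches:
    - path:                             # existing match for the base model
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        name: X-Gateway-Model-Name
        value: 'meta-llama/Llama-3.1-8B-Instruct'   # placeholder base-model name
    - path:                             # LoRA match from this PR, folded in
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        name: X-Gateway-Model-Name
        value: 'food-review-1'
    timeouts:
      request: 300s
EOF
```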
```yaml
        - --max-loras
        - "2"
        - --lora-modules
        - '{"name": "food-review"}'
```
Maybe we can call the LoRA adapters by completely different names to avoid confusion?
(The original deployment has food-review-1.)
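As a rough illustration of the renaming idea, the serving flags could look something like the following; the adapter name, base model, and adapter path are made-up placeholders, not values from the guide:

```bash
# Hypothetical renaming: give this deployment's adapter a clearly distinct
# name (here "food-review-llama") so it cannot be confused with the
# "food-review-1" adapter served by the original deployment.
# The base model and the adapter path are placeholders.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --enable-lora \
  --max-loras 2 \
  --lora-modules '{"name": "food-review-llama", "path": "/adapters/food-review-llama"}'
```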
@davidbreitgand minor addition (letting @nirrozenbaum drive the review)
```md
### Serving multiple LoRAs per base AI model

<div style="border: 1px solid red; padding: 10px; border-radius: 5px;">
⚠️ Known Limitation: LoRA names must be unique across the base AI models (i.e., across the backend inference server deployments)
```
"Known limitation" almost implies it's wrong in some way... can we just drop the "limitation" part?
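For context on the constraint itself: since routing keys off the X-Gateway-Model-Name header alone, a duplicated adapter name would make two routes indistinguishable. A sketch of the clash, with invented route and pool names purely for illustration:

```bash
# Hypothetical clash: two HTTPRoutes match the identical X-Gateway-Model-Name
# value but target different InferencePools, so the gateway cannot tell two
# same-named adapters apart by LoRA name alone.
kubectl apply -f - <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: lora-food-review-on-llama        # invented name
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
    matches:
    - path:
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        name: X-Gateway-Model-Name
        value: 'food-review'             # same LoRA name...
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: lora-food-review-on-other-model  # invented name
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: some-other-base-model-pool   # invented pool name
    matches:
    - path:
        type: PathPrefix
        value: /
      headers:
      - type: Exact
        name: X-Gateway-Model-Name
        value: 'food-review'             # ...matched again: ambiguous routing
EOF
```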
```md
<div style="border: 1px solid red; padding: 10px; border-radius: 5px;">
⚠️ Known Limitation:
[Kubernetes API Gateway limits the total number of matchers per HTTPRoute to be less than 128](https://github.com/kubernetes-sigs/gateway-api/blob/df8c96c254e1ac6d5f5e0d70617f36143723d479/apis/v1/httproute_types.go#L128).
```
This link isn't working in the preview:
https://deploy-preview-1859--gateway-api-inference-extension.netlify.app/guides/serve-multiple-genai-models/
```
2. Send a few requests to the LoRA of the Llama model as follows:
```bash
The formatting is strange here in the preview as well.
}'
```
2. Send a few requests to the LoRA of the Llama model as follows:
Suggest just using 1. for all ordered list entries.
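For context on this step, a request to the LoRA might look roughly like the following, assuming BBR is deployed; the address and body are placeholders rather than the guide's exact example:

```bash
# Hypothetical request to the food-review-1 LoRA. Unlike the manual-injection
# sketch earlier in the thread, the client sets no routing header: with BBR
# deployed, the model name is copied from the JSON body into
# X-Gateway-Model-Name. IP and PORT are placeholders.
curl -i http://${IP}:${PORT}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "food-review-1",
        "prompt": "Write a short review of a pizzeria.",
        "max_tokens": 100
      }'
```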
kind/documentation
Closes #1858